
Deobfuscation of the code base + pep8 and fixes #481

Closed · wants to merge 307 commits

Conversation


@hill-a commented Jul 27, 2018

  • Fixed tf.Session().__enter__() being used rather than sess = tf.Session() with the session passed explicitly to the objects (see the sketch at the end of this comment)
  • Fixed uneven scoping of TensorFlow sessions throughout the code
  • Fixed the rolling vecwrapper to handle observations that are not only grayscale images
  • Fixed deepq saving the environment when trying to save itself
  • Fixed ValueError: Cannot take the length of Shape with unknown rank. in acktr when running the run_atari.py script
  • Fixed graph conflicts when calling baselines algorithms sequentially
  • Fixed the mean-on-empty-array warning in deepq
  • Fixed the kfac eigendecomposition not being cast to float64 when the parameter use_float64 is set to True
  • Fixed the Dataset data loader not correctly resetting its index position when shuffling is disabled
  • Fixed EOFError when reading from the connection in the worker in subproc_vec_env.py
  • Fixed behavior_clone weight loading and saving for GAIL
  • Avoided taking the square root of a negative number in trpo_mpi.py
  • Removed some duplicated code (a2cpolicy, trpo_mpi)
  • Removed the unused, undocumented and crashing function reset_task in subproc_vec_env.py
  • Reformatted code to PEP 8 style
  • Documented the entire codebase
  • Added atari tests
  • Added logger tests

Missing: tests for acktr continuous (plus HER and GAIL, but they rely on MuJoCo...)
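
For context, a minimal sketch of the session-handling change from the first bullet, assuming TF1-style sessions; the Model class below is hypothetical, standing in for the baselines policy and algorithm objects:

```python
import tensorflow as tf  # TF1-style API, as baselines used at the time

# Pattern the PR removes: entering a session implicitly via __enter__(),
# which hides the session's lifetime and leads to uneven scoping:
# tf.Session().__enter__()  # the returned session is never explicitly closed

# Pattern the PR introduces: create one session and pass it explicitly.
class Model:
    """Hypothetical consumer; the real ones are the baselines policy objects."""
    def __init__(self, sess):
        self.sess = sess  # every graph evaluation goes through this handle

sess = tf.Session()
model = Model(sess=sess)
sess.close()  # the owner of the session controls its lifetime
```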

@hill-a closed this Aug 29, 2018
@pzhokhov
Collaborator

Hi @hill-a and @araffin! Any chance you guys are still interested in merging your changes? The amount of work that you have done on improving the code quality is incredible; and we'd love to cooperate instead of competing. If you are, let's reopen this and work on the merge conflicts / divergence points between refactors together. Putting @joschu, @miramurati and @jachiam as principal stakeholders in the loop on this as well.

@araffin
Contributor

araffin commented Sep 22, 2018

Hello @pzhokhov,

We are glad you finally gave some sign of life ;) (almost two months!)

With @hill-a, we would love to see those changes merged (even though the two codebases have diverged a bit, and it will take some work to fix the conflicts).
However, we would like some clarifications on OpenAI Baselines and the way it is managed, because we don't want to merge the changes only to see the code broken again afterward.

My first question is a general one: how is OpenAI Baselines managed, what is the roadmap, and how much importance does OpenAI give to Baselines?

From what I understood, you are now the only one fully in charge of Baselines (as @unixpickle was some months ago), and each algorithm in the repo comes from a different person (e.g. @joschu for TRPO, @matthiasplappert for HER, ...), which explains the code duplication and the differing code styles.

Another question: there seems to be an internal Baselines repo that differs from the GitHub version. What are the differences and why is it needed? (I think this is related, but could you explain some weird commits like this one, with lots of commit text for almost no changes: e791565?)
A bit more transparency would be appreciated; it would avoid regressions (e.g. #344, which was fixed in 4993286 but reintroduced in 9fa8e1b).

Finally, if we merged, we obviously would like to be credited in the README ;)

@jachiam

jachiam commented Sep 22, 2018

Hi @araffin and @hill-a!

Chiming in to say hello, introduce myself, and to answer a few of the questions.

Intro: I'm Josh, and I'm a researcher on the safety team at OpenAI! Lately I've been thinking about getting more involved in making sure our public software releases are more user-friendly.

On how Baselines is managed: You're correct that @pzhokhov is the primary owner of Baselines, and that the algorithms were originally implemented by many different people. OpenAI is currently thinking about the long-term state of this project (and others), and will hopefully be able to discuss a complete plan in the near future. Since @pzhokhov has been writing the roadmap for Baselines, I'll leave it to him to talk about that.

On the internal version of Baselines: recall that OpenAI is first and foremost a research organization. The internal repo changes rapidly to support research use cases, and includes code from in-progress research projects which aren't ready for release.

Regarding credit: of course we would gratefully acknowledge such contributions. :)

@pzhokhov
Collaborator

Hi @araffin and @hill-a ! Thanks for maintaining your interest in the project!
Continuing Josh's idea of personal intros - I am Peter, an ML engineer on the OpenAI games team.
Regarding your questions (simple ones first :))

  1. Credit - of course

  2. Internal version - the motivation is outlined by Josh. I can add that we used to have an entirely separate private repo; the export / import was a manual process, error-prone as you pointed out, and it led to maintenance of baselines being a bit of an afterthought. Now our internal version is a git subrepo fork of baselines, which gives much tighter integration, leaves little chance of accidentally overwriting bug fixes, and keeps the git history correct.

  3. Management - at the moment I am in charge of baselines, but that's not to say I am the only one who writes code in it - there have been quite a few patches and bug fixes from both internal and external contributors recently. We are also thinking of augmenting the team responsible for baselines with more people (@miramurati may be in a better position to comment on that). Your assessment of the reasons for the code duplication is 100% accurate.

  4. Roadmap... Step 1) fix existing algos; step 2) implement new algos; step 3) profit :)
    Speaking seriously though, our roadmap is a bit of a fire-fighting one, and looks more like a loosely sorted laundry list (in its current form, it does not take your changes into account).

  • Docs + unit tests

  • Visualization framework - something to easily connect / compare runs; maybe as simple as helper utils for tensorboard (scope is a bit unclear)

  • Run and publish all benchmarks; need to confirm that it's ok legally to publish mujoco-based benchmarks, maybe switch to roboschool

  • Refactor DDPG, HER, GAIL to conform to common API

  • Fix DQN hyperparams with Atari - find hyperparameters that work for all games

  • Recurrent model compatibility everywhere, or wherever possible (DQN, TRPO, ACER, ACKTR, HER, GAIL are not compatible right now)

  • Extra touches on ACER - use VecFrameStack instead of internal stacker, make compatible with observations other than stacks of images, unit tests

  • Extra touches on TRPO - compatibility with VecEnvs, remove env.reset(), Model wrapper

  • Extra touches on DQN - remove extra env.reset(), Model wrapper

  • Serialization syntactic sugar (easier ways of saving / loading the model; maybe also with the results of training) - see the sketch after this list

  • More algorithms - Vanilla PG, Soft actor-critic, Soft Q-learning, IMPALA, etc
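
To make the serialization item concrete: a minimal sketch of what one-call saving/loading could look like. The class and method names are assumptions for illustration, not an existing baselines API:

```python
import pickle

class SerializableModel:
    """Hypothetical sketch: one-call save/load instead of manual
    tf.train.Saver plumbing; names are illustrative only."""

    def __init__(self, params=None):
        self.params = params if params is not None else {}

    def save(self, path):
        # Persist parameters (and, per the roadmap item, possibly
        # training results alongside them).
        with open(path, "wb") as f:
            pickle.dump(self.params, f)

    @classmethod
    def load(cls, path):
        # Restore a model without rebuilding the graph plumbing by hand.
        with open(path, "rb") as f:
            return cls(params=pickle.load(f))

# Usage: model.save("model.pkl"); restored = SerializableModel.load("model.pkl")
```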

@araffin
Contributor

araffin commented Oct 5, 2018

Hello @pzhokhov,

Thank you for your answer.
It could be a good idea to have a public roadmap (not just written in a closed pull request) with estimated dates if possible.

My main concern is still the internal repo. I perfectly understand the usefulness of a private repo for in-progress research. However, as of now, it impacts the public repo in unpredictable ways: commit messages coming from the subrepo do not reflect the changes, and some code that is unused in the public repo is also added. A solution would be to have a proper fork of baselines and open PRs from time to time to update the public repo.

@jperl mentioned this pull request Oct 15, 2018
@pzhokhov
Collaborator

Public roadmap - fair point; we are working on finalizing it (should be done and published in a few days).
As for the internal repo - I understand the concern; at the same time there is quite a bit of value for us in having a subrepo fork, as several internal repos depend on baselines and need to be changed in accord. I believe a reasonable solution is to push internal changes into a feature branch and make PRs into master - that way the changes are more transparent and have a chance of getting external comments.

@pzhokhov
Collaborator

Turns out the baselines roadmap has high enough visibility that the bosses stepped in and want to polish it and make a press release with it, so I cannot publish it before that.

@tomasruizt

Hi, @pzhokhov what does that mean in terms of timing? A month delay?

@jrjbertram

Coming into this conversation from the outside... as a user of both pieces of software, I have found stable-baselines much more consistent. From a research side, it's great that you are publishing the algorithm implementations; having vetted implementations is a great resource, especially when the authors of the original papers clarify subtle points in the openai implementations.

From a world-impact perspective, the changes that stable-baselines has made take openai baselines and make it actually useful, turning the "raw" algorithms into something consistent, approachable, and reusable... I would argue that openai should consider stable-baselines the project to contribute to. I'm only half joking... presumably openai's goal is for gym and baselines to eventually take on a life of their own... this seems like a perfect way to do it.

Both teams are doing great work... as a community member would love to find a way for you both to benefit from a strong relationship, as you each bring something valuable to the table!

Back to my corner...

  • J.

@jonas-eschle

jonas-eschle commented Jan 3, 2019

Commenting as a user as well:

what OpenAI has done is great: it went for unification, providing common gyms and providing baselines. The idea and some implementations are great, no question. And don't let the following change that: I'm a huge fan of what you do/did; it is one of those cornerstones that really moves the community forward!

But: the repositories are not quite what they should be (talking about gym and baselines):

  • missing unit tests
  • basically no (useful) docs (with sentences like "It's very easy to..." where the actual explanation or an example should be... guys!)
  • coming from a put-together approach (which may be a good thing at the beginning) instead of starting from common ground (e.g. baseline implementations)
  • ...which implies things like no common coding standards

stable-baselines has done exactly all that. Baselines is an experimental, you-have-to-figure-it-out-yourself repository, while stable-baselines is the stable, user-friendly repository (that OpenAI may have aimed for).

My opinion: gym should become more like stable-baselines, and stable-baselines should become standalone. It is just so much better to use (yes, the extra effort is not too much, but it's about the way of approaching the user).

Again: great compliments for what you've done at OpenAI. But user-friendliness and the last 20% of stability are what seem to be missing (for me, IMHO, as a user), and that is where stable-baselines seems to know what it's doing.

@federicofontana

gym.Env and gym.Space have been trusted and widely used by many people, and they are now an industry standard. Have you considered taking the same direction with a gym.Agent? If so, merging openai/baselines with stable-baselines would be a huge step forward.

As of today:

  1. openai/baselines is probably the most reliable collection of implementations of RL agents
  2. there is not a single well-established RL framework in Python. keras-rl is dead, tensorforce is dying; ray/rllib might be your only "competitor"

Considering the excellent reputation of OpenAI, I wouldn't be surprised if openai/baselines became the standard RL framework one day. However, a necessary condition would be to clean up the spaghetti code, or at least provide a standard interface to the users (see the sketch below).

@araffin @pzhokhov
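
For concreteness, a minimal sketch of what a hypothetical gym.Agent could look like; gym.Agent does not exist, and the method names below are assumptions, loosely mirroring the style of gym.Env:

```python
import gym

class Agent:
    """Hypothetical gym.Agent: a standard interface in the spirit of
    gym.Env / gym.Space (not an actual gym class)."""

    def __init__(self, observation_space: gym.Space, action_space: gym.Space):
        self.observation_space = observation_space
        self.action_space = action_space

    def act(self, observation):
        """Return an action for the given observation."""
        raise NotImplementedError

    def learn(self, env: gym.Env, total_timesteps: int):
        """Train on the given environment; concrete algorithms override this."""
        raise NotImplementedError
```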

zhuyifengzju pushed a commit to zhuyifengzju/baselines that referenced this pull request Sep 29, 2019
* Update DQN arguments: add double_q

* Fix CI build

* [ci skip] Update link

* Replace pdf link with arxiv link